Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Prediction of Selection Decision of Document Using Bibliographic Data at the National Library of France (BnF)

Internal identifier: 000227 (Main/Exploration)

Authors: Ahmed Ben Salah [France]; Geneviève Cron [France]; Nicolas Ragot [France]; Thierry Paquet [France]

RBID: Hal:hal-00737893

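English descriptors: Correspondence Factor Analysis; Data analysis; Multiple Correspondence Analysis; Optical Character Recognition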

Abstract

Document selection is a critical step in mass digitization projects. This is especially true at the BnF, where whether digitization also includes OCR depends on the expected OCR results. The selection task is therefore complex and time-consuming, given the number of documents to process and the diversity of the selection criteria to consider. To improve and simplify this task through automation, we studied the relationship between bibliographic data and document selection decisions. We used two statistical analyses: a correspondence factor analysis and a multiple correspondence analysis. Our analysis showed, for example, that documents in format "4 or GR FOL" published "between 1961 and 1990" in Morocco are more likely to be "Selected". Conversely, documents in format "16 or 8" published "between 1871 and 1800" in English or Spanish are more likely to be "Not Selected".
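The abstract names two techniques without showing what they compute. As a minimal sketch, assuming bibliographic records are available as Python dicts (the field names `format` and `decision`, the labels `format_a`/`format_b`, and all counts below are hypothetical toy values, not BnF data or results), the code builds the contingency table that correspondence analysis (CA) works on and the complete disjunctive (indicator) table that multiple correspondence analysis (MCA) is computed from, then derives the matrix of standardized residuals whose singular value decomposition gives the factor axes. It stops before the SVD itself (a linear-algebra routine such as `numpy.linalg.svd` would normally do that step), and it is a generic illustration of CA, not the authors' implementation.

```python
import math
from collections import Counter


def crosstab(records, row_key, col_key):
    """Count co-occurrences of two categorical fields (the contingency table used by CA)."""
    rows = sorted({rec[row_key] for rec in records})
    cols = sorted({rec[col_key] for rec in records})
    counts = Counter((rec[row_key], rec[col_key]) for rec in records)
    return rows, cols, [[counts[(a, b)] for b in cols] for a in rows]


def indicator_matrix(records, keys):
    """Complete disjunctive 0/1 table, one column per (field, category): MCA is CA of this table."""
    columns = [(k, v) for k in keys for v in sorted({rec[k] for rec in records})]
    table = [[1 if rec[k] == v else 0 for (k, v) in columns] for rec in records]
    return columns, table


def standardized_residuals(table):
    """Return (S, total inertia) for a table of non-negative counts.

    S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), with p_ij = n_ij / n and
    r_i, c_j the row and column masses. CA takes the SVD of S; the sum of its
    squared entries (the total inertia) equals chi2 / n. Every row and column
    must have a non-zero total.
    """
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]
    c = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    S = [
        [(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
        for i in range(len(r))
    ]
    inertia = sum(v * v for row in S for v in row)
    return S, inertia


if __name__ == "__main__":
    # Toy records invented for illustration only; these are NOT BnF data or results.
    records = (
        [{"format": "format_a", "decision": "Selected"}] * 30
        + [{"format": "format_a", "decision": "Not Selected"}] * 10
        + [{"format": "format_b", "decision": "Selected"}] * 10
        + [{"format": "format_b", "decision": "Not Selected"}] * 30
    )
    rows, cols, table = crosstab(records, "format", "decision")
    S, inertia = standardized_residuals(table)
    print("chi2 =", sum(map(sum, table)) * inertia)
    # A positive residual means the pair co-occurs more often than under independence.
    for label, row in zip(rows, S):
        print(label, {col: round(v, 3) for col, v in zip(cols, row)})
```

As a sanity check, on an indicator table built from Q fields with J categories in total (each category used at least once), `standardized_residuals` returns a total inertia of J/Q - 1 regardless of the data.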

Url: https://hal-bnf.archives-ouvertes.fr/hal-00737893


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Prediction of Selection Decision of Document Using Bibliographic Data at the National Library of France (BnF)</title>
<author>
<name sortKey="Ben Salah, Ahmed" sort="Ben Salah, Ahmed" uniqKey="Ben Salah A" first="Ahmed" last="Ben Salah">Ahmed Ben Salah</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-399419" status="INCOMING">
<orgName>DocApp et Rfai</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-23832" type="direct"></relation>
<relation active="#struct-301288" type="indirect"></relation>
<relation active="#struct-301232" type="indirect"></relation>
<relation name="EA4108" active="#struct-300318" type="indirect"></relation>
<relation active="#struct-300317" type="indirect"></relation>
<relation active="#struct-203066" type="direct"></relation>
<relation active="#struct-302209" type="indirect"></relation>
<relation active="#struct-204893" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="indirect"></relation>
<relation active="#struct-300408" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-23832" type="direct">
<org type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301288" type="indirect">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="indirect">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300317" type="indirect">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-203066" type="direct">
<org type="laboratory" xml:id="struct-203066" status="VALID">
<orgName>Bibliothèque nationale de France, Délégation à la Stratégie et à la recherche</orgName>
<orgName type="acronym">BnF_DSG</orgName>
<desc>
<address>
<addrLine>Quai François Mauriac, 75706 Paris cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.bnf.fr/fr/la_bnf/strategie_recherche.html</ref>
</desc>
<listRelation>
<relation active="#struct-302209" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-302209" type="indirect">
<org type="institution" xml:id="struct-302209" status="VALID">
<orgName>Bibliothèque Nationale de France</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-204893" type="direct">
<org type="laboratory" xml:id="struct-204893" status="VALID">
<orgName>Laboratoire d'Informatique de l'Université de Tours</orgName>
<orgName type="acronym">LI</orgName>
<desc>
<address>
<addrLine>64, Avenue Jean Portalis, 37200 Tours</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.li.univ-tours.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300408" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="EA6300" active="#struct-300298" type="indirect">
<org type="institution" xml:id="struct-300298" status="VALID">
<orgName>Université François Rabelais - Tours</orgName>
<desc>
<address>
<addrLine>60 rue du Plat d'Étain, 37020 Tours cedex 1 </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-tours.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300408" type="indirect">
<org type="institution" xml:id="struct-300408" status="VALID">
<orgName>Polytech'Tours</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
<placeName>
<settlement type="city">Tours</settlement>
<region type="old region" nuts="2">Région Centre</region>
<region type="region" nuts="2">Centre-Val de Loire</region>
</placeName>
<orgName type="university">Université François-Rabelais de Tours</orgName>
<orgName type="institution" wicri:auto="newGroup">Centre Val de Loire Université</orgName>
</affiliation>
</author>
<author>
<name sortKey="Cron, Genevieve" sort="Cron, Genevieve" uniqKey="Cron G" first="Geneviève" last="Cron">Geneviève Cron</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-399420" status="INCOMING">
<orgName>Service numérisation</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-201530" type="direct"></relation>
<relation active="#struct-300125" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-201530" type="direct">
<org type="laboratory" xml:id="struct-201530" status="VALID">
<orgName>Bibliothèque nationale de France</orgName>
<orgName type="acronym">BnF</orgName>
<desc>
<address>
<addrLine>Quai François Mauriac, 75706 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.bnf.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300125" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300125" type="indirect">
<org type="institution" xml:id="struct-300125" status="VALID">
<orgName>Ministère de la Culture et de la Communication</orgName>
<orgName type="acronym">MCC</orgName>
<desc>
<address>
<addrLine>3, rue de Valois 75001 Paris</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Ragot, Nicolas" sort="Ragot, Nicolas" uniqKey="Ragot N" first="Nicolas" last="Ragot">Nicolas Ragot</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-204893" status="VALID">
<orgName>Laboratoire d'Informatique de l'Université de Tours</orgName>
<orgName type="acronym">LI</orgName>
<desc>
<address>
<addrLine>64, Avenue Jean Portalis, 37200 Tours</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.li.univ-tours.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300408" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300408" type="direct">
<org type="institution" xml:id="struct-300408" status="VALID">
<orgName>Polytech'Tours</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA6300" active="#struct-300298" type="direct">
<org type="institution" xml:id="struct-300298" status="VALID">
<orgName>Université François Rabelais - Tours</orgName>
<desc>
<address>
<addrLine>60 rue du Plat d'Étain, 37020 Tours cedex 1 </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-tours.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Tours</settlement>
<region type="old region" nuts="2">Région Centre</region>
<region type="region" nuts="2">Centre-Val de Loire</region>
</placeName>
<orgName type="university">Université François-Rabelais de Tours</orgName>
<orgName type="institution" wicri:auto="newGroup">Centre Val de Loire Université</orgName>
</affiliation>
</author>
<author>
<name sortKey="Paquet, Thierry" sort="Paquet, Thierry" uniqKey="Paquet T" first="Thierry" last="Paquet">Thierry Paquet</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-389520" status="INCOMING">
<orgName>DOCAPP</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-23832" type="direct"></relation>
<relation active="#struct-300317" type="indirect"></relation>
<relation name="EA4108" active="#struct-300318" type="indirect"></relation>
<relation active="#struct-301288" type="indirect"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-23832" type="direct">
<org type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300317" type="indirect">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="indirect">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="indirect">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-00737893</idno>
<idno type="halId">hal-00737893</idno>
<idno type="halUri">https://hal-bnf.archives-ouvertes.fr/hal-00737893</idno>
<idno type="url">https://hal-bnf.archives-ouvertes.fr/hal-00737893</idno>
<date when="2012-06-12">2012-06-12</date>
<idno type="wicri:Area/Hal/Corpus">000099</idno>
<idno type="wicri:Area/Hal/Curation">000099</idno>
<idno type="wicri:Area/Hal/Checkpoint">000065</idno>
<idno type="wicri:Area/Main/Merge">000231</idno>
<idno type="wicri:Area/Main/Curation">000227</idno>
<idno type="wicri:Area/Main/Exploration">000227</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Prediction of Selection Decision of Document Using Bibliographic Data at the National Library of France (BnF)</title>
<author>
<name sortKey="Ben Salah, Ahmed" sort="Ben Salah, Ahmed" uniqKey="Ben Salah A" first="Ahmed" last="Ben Salah">Ahmed Ben Salah</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-399419" status="INCOMING">
<orgName>DocApp et Rfai</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-23832" type="direct"></relation>
<relation active="#struct-301288" type="indirect"></relation>
<relation active="#struct-301232" type="indirect"></relation>
<relation name="EA4108" active="#struct-300318" type="indirect"></relation>
<relation active="#struct-300317" type="indirect"></relation>
<relation active="#struct-203066" type="direct"></relation>
<relation active="#struct-302209" type="indirect"></relation>
<relation active="#struct-204893" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="indirect"></relation>
<relation active="#struct-300408" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-23832" type="direct">
<org type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301288" type="indirect">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="indirect">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300317" type="indirect">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-203066" type="direct">
<org type="laboratory" xml:id="struct-203066" status="VALID">
<orgName>Bibliothèque nationale de France, Délégation à la Stratégie et à la recherche</orgName>
<orgName type="acronym">BnF_DSG</orgName>
<desc>
<address>
<addrLine>Quai François Mauriac, 75706 Paris cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.bnf.fr/fr/la_bnf/strategie_recherche.html</ref>
</desc>
<listRelation>
<relation active="#struct-302209" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-302209" type="indirect">
<org type="institution" xml:id="struct-302209" status="VALID">
<orgName>Bibliothèque Nationale de France</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-204893" type="direct">
<org type="laboratory" xml:id="struct-204893" status="VALID">
<orgName>Laboratoire d'Informatique de l'Université de Tours</orgName>
<orgName type="acronym">LI</orgName>
<desc>
<address>
<addrLine>64, Avenue Jean Portalis, 37200 Tours</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.li.univ-tours.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300408" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="EA6300" active="#struct-300298" type="indirect">
<org type="institution" xml:id="struct-300298" status="VALID">
<orgName>Université François Rabelais - Tours</orgName>
<desc>
<address>
<addrLine>60 rue du Plat d'Étain, 37020 Tours cedex 1 </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-tours.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300408" type="indirect">
<org type="institution" xml:id="struct-300408" status="VALID">
<orgName>Polytech'Tours</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
<placeName>
<settlement type="city">Tours</settlement>
<region type="old region" nuts="2">Région Centre</region>
<region type="region" nuts="2">Centre-Val de Loire</region>
</placeName>
<orgName type="university">Université François-Rabelais de Tours</orgName>
<orgName type="institution" wicri:auto="newGroup">Centre Val de Loire Université</orgName>
</affiliation>
</author>
<author>
<name sortKey="Cron, Genevieve" sort="Cron, Genevieve" uniqKey="Cron G" first="Geneviève" last="Cron">Geneviève Cron</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-399420" status="INCOMING">
<orgName>Service numérisation</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-201530" type="direct"></relation>
<relation active="#struct-300125" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-201530" type="direct">
<org type="laboratory" xml:id="struct-201530" status="VALID">
<orgName>Bibliothèque nationale de France</orgName>
<orgName type="acronym">BnF</orgName>
<desc>
<address>
<addrLine>Quai François Mauriac, 75706 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.bnf.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300125" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300125" type="indirect">
<org type="institution" xml:id="struct-300125" status="VALID">
<orgName>Ministère de la Culture et de la Communication</orgName>
<orgName type="acronym">MCC</orgName>
<desc>
<address>
<addrLine>3, rue de Valois 75001 Paris</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Ragot, Nicolas" sort="Ragot, Nicolas" uniqKey="Ragot N" first="Nicolas" last="Ragot">Nicolas Ragot</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-204893" status="VALID">
<orgName>Laboratoire d'Informatique de l'Université de Tours</orgName>
<orgName type="acronym">LI</orgName>
<desc>
<address>
<addrLine>64, Avenue Jean Portalis, 37200 Tours</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.li.univ-tours.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300408" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300408" type="direct">
<org type="institution" xml:id="struct-300408" status="VALID">
<orgName>Polytech'Tours</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA6300" active="#struct-300298" type="direct">
<org type="institution" xml:id="struct-300298" status="VALID">
<orgName>Université François Rabelais - Tours</orgName>
<desc>
<address>
<addrLine>60 rue du Plat d'Étain, 37020 Tours cedex 1 </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-tours.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Tours</settlement>
<region type="old region" nuts="2">Région Centre</region>
<region type="region" nuts="2">Centre-Val de Loire</region>
</placeName>
<orgName type="university">Université François-Rabelais de Tours</orgName>
<orgName type="institution" wicri:auto="newGroup">Centre Val de Loire Université</orgName>
</affiliation>
</author>
<author>
<name sortKey="Paquet, Thierry" sort="Paquet, Thierry" uniqKey="Paquet T" first="Thierry" last="Paquet">Thierry Paquet</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-389520" status="INCOMING">
<orgName>DOCAPP</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-23832" type="direct"></relation>
<relation active="#struct-300317" type="indirect"></relation>
<relation name="EA4108" active="#struct-300318" type="indirect"></relation>
<relation active="#struct-301288" type="indirect"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-23832" type="direct">
<org type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300317" type="indirect">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="indirect">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="indirect">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="en">
<term>Correspondence Factor Analysis</term>
<term>Data analysis</term>
<term>Multiple Correspondence Analysis</term>
<term>Optical Character Recognition</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Document selection is a critical step in mass digitization projects. This is especially true at the BnF, where whether digitization also includes OCR depends on the expected OCR results. The selection task is therefore complex and time-consuming, given the number of documents to process and the diversity of the selection criteria to consider. To improve and simplify this task through automation, we studied the relationship between bibliographic data and document selection decisions. We used two statistical analyses: a correspondence factor analysis and a multiple correspondence analysis. Our analysis showed, for example, that documents in format "4 or GR FOL" published "between 1961 and 1990" in Morocco are more likely to be "Selected". Conversely, documents in format "16 or 8" published "between 1871 and 1800" in English or Spanish are more likely to be "Not Selected".</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Centre-Val de Loire</li>
<li>Normandie</li>
<li>Région Centre</li>
</region>
<settlement>
<li>Rouen</li>
<li>Tours</li>
</settlement>
<orgName>
<li>Centre Val de Loire Université</li>
<li>Université François-Rabelais de Tours</li>
<li>Université de Rouen</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Normandie">
<name sortKey="Ben Salah, Ahmed" sort="Ben Salah, Ahmed" uniqKey="Ben Salah A" first="Ahmed" last="Ben Salah">Ahmed Ben Salah</name>
</region>
<name sortKey="Cron, Genevieve" sort="Cron, Genevieve" uniqKey="Cron G" first="Geneviève" last="Cron">Geneviève Cron</name>
<name sortKey="Paquet, Thierry" sort="Paquet, Thierry" uniqKey="Paquet T" first="Thierry" last="Paquet">Thierry Paquet</name>
<name sortKey="Ragot, Nicolas" sort="Ragot, Nicolas" uniqKey="Ragot N" first="Nicolas" last="Ragot">Nicolas Ragot</name>
</country>
</tree>
</affiliations>
</record>
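The record above is TEI wrapped in a Wicri `<record>` element, so standard XML tooling can read it. A minimal sketch using only Python's standard `xml.etree.ElementTree`: the dump uses the `hal:` and `wicri:` prefixes without declaring them, so the sketch first injects placeholder declarations (the URIs `urn:example:hal` and `urn:example:wicri` are hypothetical, not the official namespaces), then extracts the title, author names and HAL identifier. The helper names `parse_record` and `summarize` are illustrative only; this is not a Dilib tool.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (hypothetical, not the official ones): the dump
# uses the hal: and wicri: prefixes without declaring them, and a conforming
# XML parser rejects undeclared prefixes ("unbound prefix").
NS_DECL = 'xmlns:hal="urn:example:hal" xmlns:wicri="urn:example:wicri"'


def parse_record(xml_text):
    """Parse a Wicri <record> dump after declaring its two undeclared prefixes."""
    return ET.fromstring(xml_text.replace("<record>", f"<record {NS_DECL}>", 1))


def summarize(record):
    """Return the title, author names and HAL identifier from the TEI header."""
    file_desc = record.find("./TEI/teiHeader/fileDesc")
    return {
        "title": file_desc.findtext("./titleStmt/title"),
        "authors": [a.findtext("name") for a in file_desc.findall("./titleStmt/author")],
        "halId": file_desc.findtext("./publicationStmt/idno[@type='halId']"),
    }


if __name__ == "__main__":
    # Trimmed excerpt of the record above (most affiliation details omitted).
    excerpt = """<record><TEI><teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en">Prediction of Selection Decision of Document Using Bibliographic Data at the National Library of France (BnF)</title>
<author><name last="Ben Salah">Ahmed Ben Salah</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" status="INCOMING"/></affiliation>
</author>
<author><name last="Cron">Geneviève Cron</name></author>
</titleStmt>
<publicationStmt><idno type="halId">hal-00737893</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""
    print(summarize(parse_record(excerpt)))
```

The same `find`/`findtext` paths apply to the full record, since `titleStmt` and `publicationStmt` sit directly under `fileDesc`.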

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000227 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000227 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:hal-00737893
   |texte=   Prediction of Selection Decision of Document Using Bibliographic Data at the National Library of France (BnF)
}}
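When many records need such links, the template call can be generated rather than typed. A minimal Python sketch; the function name `explor_link` and its parameter names are hypothetical, and it only reproduces the field layout of the template call above. It is not part of Dilib or the Wicri tooling.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Build an {{Explor lien}} wikitext call with the same fields as the example above."""
    fields = [
        ("wiki", wiki), ("area", area), ("flux", flux), ("étape", step),
        ("type", key_type), ("clé", key), ("texte", text),
    ]
    # Pad "name=" to a fixed width for readability only; MediaWiki strips
    # surrounding whitespace from named template parameters.
    body = "\n".join(f"   |{name + '=':<9}{value}" for name, value in fields)
    return "{{Explor lien\n" + body + "\n}}"


if __name__ == "__main__":
    print(explor_link(
        "Ticri/CIDE", "OcrV1", "Main", "Exploration", "RBID", "Hal:hal-00737893",
        "Prediction of Selection Decision of Document Using Bibliographic Data "
        "at the National Library of France (BnF)",
    ))
```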

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024